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Abstract 

We approximate the distribution of the sum of independent but not 
necessarily identically distributed Bernoulli random variables using a 
shifted binomial distribution where the three parameters (the number 
of trials, the probability of success, and the shift amount) are chosen to 
match up the first three moments of the two distributions. We give a 
bound on the approximation error in terms of the total variation metric 
using Stein's method. A numerical study is discussed that shows shifted 
binomial approximations typically are more accurate than Poisson or 
standard binomial approximations. The application of the approxima- 
tion to solving a problem arising in Bayesian hierarchical modeling is also 
discussed. 

1 INTRODUCTION 

A common method for improving the accuracy of an approximation is the 
construction of an asymptotic expansion. In practice, however, this can be 
more time consuming and much less convenient than calculating the values of 
a known distribution. An alternative approach is thus to modify a common 
approximating distribution by introducing some new parameters which then 
can be used to achieve a better fit. The use of common distributions can make 
it easy to avoid the need for specialized programming when using standard 
statistical packages to model data. 



'Vilnius University, Faculty of Mathematics and Informatics, Naugarduko 24, Vilnius 
LT-03223 

t (Corresponding Author) Boston University School of Management, 595 Commonwealth 
Avenue, Boston, MA 02215, Phone: 617-353-2676, Email: pekoz@bu.edu 
"'"National University of Singapore, 2 Science Drive 2, 117543 Singapore 
§ Center for Organization, Leadership and Management Research, Veterans' Health Ad- 
ministration, Boston, MA, and Boston University School of Management, 595 Common- 
wealth Avenue, Boston, MA 02215 



1 



One of the simplest modifications is shifting, and this approach works well 
in the Poisson case. For a large number of independent rare events, the dis- 
tribution of the number of them that occur is often well approximated by a 
Poisson distribution. If some of the events are not in fact so rare, this approx- 
imation is likely to be poor: the expected number of events occurring may not 
be close to the variance, but these are equal for the Poisson distribution. One 
easy way to address this problem is to introduce a shift by adding or subtract- 
ing a constant from the Poisson random variable. This then gives essentially 
two parameters that can be fitted to match the first two moments (subject to 
the constraint that the shift is an integer). Shifted (also referred to as trans- 
lated or centere d) Poisson approximation has be e n studied in many papers: 
see, fo r examp le. Cekanavicius and Vaitkus ( 2001 ). Barbour and Cekanaviciusl 
(|2002h . lRolli3 (|2005h .l Barbour and Lmdvalll (2006), and references therein. 

One of the goals of this paper is to investigate the effect of shifting applied 
to a two-parameter distribution. It is clear that shifting changes a distribu- 
tion's mean but not its variance and higher centered moments. Can we expect 
by shifting to conveniently obtain a three-parameter distribution and match 
three corresponding moments? In the case of normal approximation, the obvi- 
ous answer is no. The normal distribution already has a parameter which can 
be treated as shifting. Since both parameters of two-parameter distributions 
are usually closely related to their first two moments, it seems important to 
show that there are natural cases where shifting can be successfully applied. 
Below we use shifted (centered, translated) binomial approximation for the 
sum of Bernoulli variables. Our primary interest for the statistical application 
we consider is in the case when the variables are independent. 

In the literature, the distribution of the sum of independent Bernoulli ran- 
dom variables with not-necessarily-identical probabilities is called a Poisson- 
binomial distribution. This distribution is widely applicable and widely stud- 
ied, and bound s on approxima t ion er rors for various approximations have been 
developed. See IChen and Lh] dl997h for an overview of the Poisson-binomial 
distribution andTPitmanl dl997) for a pplications, as well as Le Cam ( I960! ) and 



Barbour. Hoist, and Janson ( 1992bl ) for some Poisson approximation results. 



A number of researchers have studied the binomial distribution as an approx 



i matio n for the Poisson-binomial distribution. For example, IChoi and Xia 



(|2002j ) argue that binomial approximations are better than Poisson approxi- 
mations. 

Before discussing some previously obtained results, we need to introduce 
some necessary notation. Let X±, . . . X m be independent Bernoulli random 
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variables with P(Xj = 1 

m 



^ w = E£i^i- Let 



1,2,..., cr 2 = VarVF = Ai - A 2 . 



i=l 



The total variation metric distance between two random variables X and Y is 
defined as 

d T v(3?{X),3?(Y)) =sup|F(X G A)-P(y e A) | 

A 

where the supremum is taken over all Borel sets. Note that, if X and Y are 
integer-valued, then d T v(X,Y) = \ J2 ie % \ P(X = i) - P(Y = We also 
define a local metric 

d loc (J?(X), ££ (Y)) = sup | P[X = j] - P[Y = j]\ 

Th e notation | • | and {•} is used for integral and fractional parts, respectively. 
Ehml (|l99lh gives results for binomial approximation where the number of 



trials equals the number of Bernoulli variables and the success probability is 
chosen to match up the first moment. More precisely, 



d T v(j?(W),Bi(m,p)) 



< 



1 — p 



m+l 



(1-P)' 



m+l 



(m + l)p(l — p) 



i=i 



(1.2) 



where p = \\jm. Thus, the binomial approximation here is one-parameter. 
Ehm' s appr oach was later extend to Krawtchouk asymptotic expansion by 

iRood (l2000h. 

barb our et al. ( 1992bl . p. 190) treated the binomial distribution as a two- 
param eter approximation. Their result was improved bv lCekanavicius and Vaitkus 
( 200ll . Section 4), who showed that 



d TV (j?(W),Bi(n,p)) 



< 



1 — p 



min ( 1 



a 



A3 
Ai 



A| 
A? 



+ 



A 2 {Af/A 2 } 
Ai(l - p)n 



+ P(W > n) 



(1.3) 



Here n = I A 2 / A 2 I , V = ■ X-\/n. Not e that Cekanavicius and Vaitkusl (|200ll ) (as 
well as Barbour et al. I (|l992bl l and [Soon] (|1996l )) in formulations of their results 
overlooked the term P(W > n), which is necessary because the support of W 
is typically larger than the support of the approximating binomial distribution. 

It is easy to see that both estimates (II. 2ft and (ll.3j) are small if all are 
close to each other. On the other hand, the second estimate can be sharper 
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than the first one. Indeed, let pi = 1/2 for i < m/2 and pi = 1/3 otherwise. 
Then the right-hand side of (jl.2p equals some absolute constant C±, mean- 
while the right-hand-side of (|1.3p after application of Chebyshev's inequality 
becomes C^m -1 / 2 . 

Note that two-paramete r binomial a pprox imations are also applied in set - 



tings with d ependence, see Soon ( 19961 ) and Cekanavicius and Roos ( 20071 ). 



Rollin ( 20081 ) used a shifted Bi(n, 1/2) to approximate sums of locally depen- 



dent random variables. 

In this article we study shifted binomial approximation where the shift, the 
number of trials, and the success probability are selected to match up the first 
three moments of the shifted binomial and the Poisson-binomial. We then give 
an upper bound on the approximation error by adapting Stein's method to the 
shifted binomial distribution. This is — to the best of our knowledge — the first 
time Stein's method is used to approximate by a distribution that fits the first 
three moments. We also discuss the results of a numerical study showing that 
a shifted binomial approximation is typically more accurate than t he Po isson 
or th e othe r standard binomial approximations discussed in [Soon (|l996h and 



Ehml (I199lh . 



At the end of the article we describe the motivating statistical applica- 
tion in health-care p rovider profiling that led to the need for a mor e accu rate 
approximation. See Pekoz. Shwartz. Christiansen, and Berlowitz (2009) for 

more detail on the application. An introduction to the use o f Bayesian hierar- 

chical models for healthcare provider profiling can be found in lAsh. Shwartz. and Pekoz 
(|2003h . 

St ein's m ethod was introduced in the context of normal approximatio n by 



Stein 



Chen ( 



1972 ) and developed for the Poisson distribution by Chen ( 19741 ) and 



19751 ). The method is particularly interesting since results in the com- 



plex setting of dependent random variables are often not much more dif ficult to 
obtain than results for independent variables. iBarbour et al.1 (|l992bh d etails 



how the m ethod can be applied to Poisson approximations, 



Ehml dl99lh and 



Lohl (|1992l ). r espectively, apply the me t hod to bin omial and multinomial ap- 



proxim ations, Barbour. Chen, and Loh ( 1992al) and Barbour and C 



irvssaphinou 



(|200ll ) to compound Poisson approximatio n, IBarbour and Brown 



(199 



ap 
2) 



to 
and 



Poisson process approximation, Pekoz ( 19961 ) to geometric approximation 
discussion of the many other distribu tions and settings the tech niqu e can be 
applie d can be found in, for example, Barbour and Chen ( 20051 ) and Reinert 
( 200 5]) . An elementary intr oduction to Stein's method can be found in Chapter 
2 of lRoss and Pekdzl (fcoOTt l. 

This paper is organized as follows. In Section 2 we give the main approxi- 
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mation theorems by adapting Stein's method to the shifted binomial distribu- 
tion and in Section 3 we prove these results. In Section 4 we discuss numerical 
results illustrating the accuracy of several approximations, and in Section 5 
we discuss the statistical application in Bayesian hierarchical modeling that 
motivated our initial interest in this approximation. 



2 MAIN RESULTS 

Let Y be a shifted binomial random variable with parameters n, p and integer 
shift s, that is, 

Y ~ Bi(re,p) *5 S (2.1) 

where * denotes convolution of measures and 5 S the measure with mass 1 at s. 
In this paper we study the approximation of W using Y with parameters n, p 
and s chosen so that the first three moments of W and Y are approximately 
equal. Due to the integer nature of n and s, it will not always be possible to 
exactly match the three moments — so we match them as closely as possible. 
We first estimate these parameters, and then give a theorem bounding the 
approximation error. It is easy to check that 

E l Y = np + s, Vary = np(l -p), E(Y - El") 3 = (1 — 2p) Var Y, 
EW = Ai, VarVF = Ai-A 2 , E(W - EIY) 3 = Ai - 3A 2 + 2A 3 . 

In order to find the values n, p and s, that match the moments best under 
the constraint on n and s are integer valued, let us first solve the system of 
equations EW = EY, VarTY = VarY, E(W - EIY) 3 = E(Y - EY) 3 for 
real- valued n*, p* and s*. The system of equations 

s* + n*p* = Ai, 
nV(l-P*)=Ai-A 2 , 
nV(l-P*)(l-2p*) = Ai-3A 2 + 2A3, 



yields the solution 

A 2 - A 3 # Ai — A 2 * \ * * /o \ 

p =- — , n = — r, s = Ai — n p . (2.2) 

Ai-A 2 ' p*(l-p*Y if v j 

We choose now 

I * I , *, n*p* + {s*} t {n*}p*+{s*} 

n = I n J , s = I s \ , p = = p H 

n n 

(in the last expression we indeed divide by n and not by n*) and then let 
Y be as in (|2.ip . Although p is real valued and therefore does not need any 
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rounding correction with respect to p* , a small perturbation is still necessary 
in order to fit the mean exactly, which is crucial to obtain better rates of 
convergence. For convenience, whenever we use a variable p (or p^, p* etc.) 
to denote a probability, the variable q (or q* etc.) will denote the counter 
probability 1—p. Let v = Ylu=i(Pi ^Qi)- Then our main result is the following. 

Theorem 2.1. Suppose X±, . . . ,X m are independent Bernoulli random vari- 
ables with F(Xj = 1) = Pi. With the definitions above, we have 

d T v(^(W),Bi(n,p)*5 s ) <K(4A 1 +2A 2 ) + rj, (2.3) 



where 



K = P ^ Q , (2.4) 



V 



a 2 (A 3 - A 4 ) - (A 2 - A 3 ) 2 _ Ax[K} + {«*}] +"{«*} 

<r 2 (l V(u/2-l)) 2 n 

(s max pj) A e~ a ^ + ((m — n — s) max pA A e~ a / 4+1 . 

i<s i>n+s 



Furthermore, 

d loc (j?(W),Bi(n,p)*5 s ) KKiSAs+AA^ + n, (2.5) 

where 

a 2 (A3-A4)-(A 2 -A 3 ) 2 Ai[{n*} + { S *}]+n{ S *} 

A3 = — , A4 



f7* 



^(lV( W /3-2)) 3/2 ' n(lV(«-l))V3 



If pi,P2, ■ ■ ■ ,Pm are such that for some fixed p we have, for all i, that either 
Pi = p or pi = 1, then and V have the same shifted binomial distribution 
and dTv(^(W), j£?(Y)) = 0. In this case after omitting 77 the right-hand 
sides of (|2.3[) and (|2.5[) also both equal zero. Dropping negative terms, using 
(A3 — A4) < a 2 < v and 1 V (av — b) > av/(l + b), and replacing all fractional 
parts by unity we obtain the following simplified bounds. 

Corollary 2.2. Under the conditions of Theorem \2.1\ we have 

d TV (J?(W),Bi(n,p) * S s ) < 17 + ^ in '' + 2e- CT2 /4+i 5 

and 

d loc (j?(W),Bi(n,p)*5 s ) < 222 + 1 ^ 2 in ~ 1 +2e-- 2 / 4+1 . 
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It is clear from this corollary that when c < p, < d for all i and for some 
absolute constants c, d, the order of upper bound on g?tv {-^{W), Bi(n,p) * 
<5 S ) is C^n" 1 ) while for d loc (jSf (W), Bi(n,p) * <5 S ) it is 0(n- 3 / 2 ). Thus, we 
obtain a significant improvement over 0(n -1 / 2 ) which can be obtained by two- 
para metric binomi al approximation fjl .3[> or by a shifted Bi(n, 1/2) distribution 
as in lRollinl (j2008h . 



3 PROOF OF THE MAIN RESULTS 

If Stein's method for normal approximation N(0, a 2 ) is applied to a random 
variable X, we typically need to bound the quantity 

na 2 f'(X)-Xf(X)} (3.1) 

for some specific functions /, were X is assumed to be centered and VarX = 
a 2 . This corresponds to fitting the first two moments. If three moments 
have to be matched, we need a different approximating distribution and a 
canonical candidate would be a centered T(r, A) distribution. This would lead 
to bounding the quantity 

E [(rA- 2 + \- 1 X)f'(X) - Xf(X)] , (3.2) 



(c.f. ILukl (|1994l . Eq. (17))) where the parameters r and A are chosen to fit the 
second and third moments of W, that is, Var W = r\~ 2 and EIF 3 / Var W = 
2A _1 (this obviously is only possible if W is skewed to the right, which we 
can always achieve by considering either W or —W). One can see that (|3.2p 
is in some sense a more general form of (|3.ip . having an additional parameter 
for skewness. On the integers, we can take a shifted binomial distribution as 
in this article. Not surprising, the Stein operator for a binomial distribution, 
shifted to have expectation Ai (ignoring rounding problems) can be written 
in a way similar to (13. 2h : see (13. 3ft below. In the following lemma we give the 
basic arguments how we can handle expressions of type (|3,2p in the discrete 
case for sums of independent indicators where all the involved parameters are 
allowed to be continous. We will deal with the rounding problems in the main 
proof of Theorem 12.11 

We need some notation first. For any function g, define the operators 
A k g(w) := A k ~ 1 g(w + 1) — A k ~ 1 g(w) with A°g := g and Qg(w) := (g(w + 
1) + g(w))/2. Note that OA = AO. We introduce the operator O in order to 
present the Stein operator of the shifted binomial in a symmetrized form, so 
that the connection with (|3.2j) should become more apparent. For the choice 



p* = 1/2, the linear part in the Ag part will vanish, so that the operator 
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indeed becomes symmetric, hence corresponds to the symmetric distribution 
Bi(n*, 1/2) shifted by -n*/2. 

Lemma 3.1. Let W be defined as before and let 

B*g(w) := (n W + (5 " " Ai)) Ag(w) - (w - X 1 )@g(w). (3.3) 

Then, for n* and p* defined as in ()2,2p . we have for any bounded function 
g : T, R tfia£ 

m 

EF^^) = Y,(P* -Pi)phi^ 3 g(Wi) 

i=l 
^ m 

= ^2 £ PiPjQiQjiPi -Pj) 2 E,A 3 g(W ij ), 
where Wii=W - X t and W iA :=W -X { - X } . 

Proof. It is easy to prove that, for any bounded function h : % — ► R, the 
following identities hold: 

E[(X< = Wft E[Ah(Wi)], (3.4) 

E[fc(W) - e^(Wi)] = -(i -pi)EA/t(Wi), (3.5) 
E[h(W) - h(Wi)] = Pi~EAh(Wi). (3.6) 

In what follows summation is always assumed to range over % = l,...,m. 
Using first (|3.4|) and then f)3.5j) we obtain that 

E[(W - Ai)e^(W)] = ^(Xi - Pi )Qg(W) = ^^EAG^Wi) 

= ^p iqi EAg(W) + Y,PMk -Pi)EA 2 5 (Wi). 
Prom ()3.4p we also deduce that 

E[(W - Ai)A<7(W)] = Y,Pi^ A2 9(W.i). 
Combining these two identities and recalling that n*p*q* = Ai — A2, 
lSB*g(W) = Y,PiQi(Pi-P*)^A 2 g(W l ). 
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Applying (|3.6p and noting that Y^PiliiPi ~ P*) = proves the first equality. 
For the second equality, we proceed with 

m 
i=l 

j m 

= ^Y1 PiPjliQjiPi ~ Pj)EA 3 5 (Wi) 

= 2^^ PiPjQiqjiPi -pj)(piEA 3 g(Wi) - Pj EA 3 g(Wj)) 

- m 

= 2^5] PiPjQiqjiPi -Pj){piEA 3 g(W ij ) +p i p j ~EA 4 g(W ij ) 

= ^ PiPmAPi - PjfE^giWij). □ 

The following fact was used already in Rollin ( 20081 ) i mplicitly. W e give a 
quick proof here. It is a simple extension of the result in Ehm ( 199ll ). and is 
necessary, as W may have a larger support than Y. 

Lemma 3.2. Let AdTL and define the operator Bf{k) := p(n — k)f(k + 1) — 
qkf(k). Let f : 7L — » R 6e i/ie solution to 

Bf(k) = L keA -Bi(n,p){A} if0<k<n, (3.7) 

and Zei /(fe) = /or A; ^ {0, 1, . . . , n}. Then, with K as defined in (|2.4|) . 

||A/|| < K (3.8) 

Furthermore, if A = {k} for some k £ we also have 

< K. (3.9) 



Proof . Note that, for 1 < k < n, f(k) coincides with the definition in lEhm 
(|199ll ). who showed that 

sup |A/(fc)|< ? q <K. 
fce{l,...,n-i} (n + l)pq 

It remains to bound A/(0) = /(l) and A/(n) = —f(n) as obviously Af(k) = 
if k < or k > n + 1. 
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Let /i := Bi(n,p) be the binomial probability measure. Then, from I Barb our et al 
3. 18 

/(*) 



(jl992bl . p. 189) we have that, for 1 < k < n and where U\. := {0, 1, ... , k} 
/j{A n U k -i] - /i{A}^{[/ fc _i} 



kq/j,{k} 



(3.10) 



kqfi{k} 

From this we have that 



in particular for k = 1 



\f(k)\< 1 Z„ cl X , (3-H) 



1/(1)1 < (1 < K (3.12) 



For the corresponding bound at the upper boundary, we have again from 
Barbour et al.1 (|1992bl . p. 189) that we can also write 



/(*) 



(n - fc + l)M fc - 1} /o 13 x 

n [^gy - n ?7 fc c _iM^-i} 1 ' ' 

(n — k + l)pfi{k — 1} 



which, applying it for k = n, leads to the same bound on Af (ri), so that 



3.8p follows. The bound on (|3.9p is immediate from the proof of lEhml (|1991 



Lemma 1). □ 

Proof of Theorem POl We need to bound | P[W — s £ A] — Bi(n,p){A}\ for 
any set A C Z. Let / : Z — > R be such that (|3.7p holds. Then we can write 

P[W-seA]-Bi(n,p){A} 

= P[W - s £ A \ {0, 1, . . . , n}} + TEBf(W - s) 

and note that this equation holds because / = outside of {0, 1, . . . , n}. 

Let the operator B* be defined as B in Lemma 13.21 but replacing n by n* 
and p by p*, respectively. Let g(w) := f(w — s) and recall that w — s = 
w — Ai + n*p* + {s*}. Then, 

Bf{w -s)= B*f{w -s) + {s*}g(w + 1) + (p* - p){w - s)Ag(w) 
=: B*f(w-s)+R 1 {w). 
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Note further that 
B*f(w-s) 

= (n*p*q* — p*(w — s — n*p*))Af(w — s) — (w — s 
= B*g(w) - p*{s*}Ag(w) - {s*}g(w) 
= :B*g(w)+R 2 (w), 

where B* is as in Lemma 13,11 Hence, 

Bf(w -s) = B*g(w) + i?i(u>) + R 2 (w) 



n*p*)f(w 



(3.15) 



(3.16) 



Let us first deal with the error terms R\ and R 2 (which arise only due to the 
necessity that n and s have to be integers). Now, 

R 1 (w)+R 2 (w) = (p* - p)wAg(w) + ({s*}(l - p*) - s(p* - p))Ag(w). 

Noting that E[^Aj(W)] = ^PiEAgiWi + 1) and recalling (|3T8]), we have 

|E[i?i(W) + R 2 (W)]\ < 2K(\ 1 \p*-p\ + {s*}) 

<2K(X 1 ({n*} + {s*})/n + {s*}) 

where we use 

|{ S *}(1 - P *) - S( P * - P )\ = \S*(1 - P *) - 8(1 - P )\ 

< \s — s*\ + \ sp — s*p*\ 

< {s*} + s*\p — p*\ + \s — s*\p 

< 2{s*} + s*\p-p*\ 
<2{s*} + X 1 \p-p*\. 

To estimate 1EB*(W) we use Lemmal3.11 Estim a tion o f E,A 3 g(Wi t j) goes 
along the lines given in iBarbour and Cekanaviciusl (|2002l . p. 521 and 541). 
For a random variable X, define first 



D k (X) = \\S?(X) * (6 - 5 1 



where || • || denotes the total variation norm when applied to measures. Note 
that D l (X) = 2d TV (if (X), JSf (X+l)). We can decompose W i:j = S iJ:1 +S iJ:2 
in such a way, that both sums of the (pi A qi) corresponding to Sij t i and Sij t2 
are greater or equal to v/2 — v*, where v* = maxi<j< m (j)j A q{). We have 



|EA 3 5 (^)| < HA^ID 2 ^) < \\Ag\\D l (S l ^ l )D l (S i ^ 2 ) 
4K 



< 



(3.17) 



1 V (v/2 - 1) 
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In the last line we used Barbour and Xia ( 19991 . Proposition 4.6) and Barbour and Cekanavicius 
(|2002l . p. 521, Estimate (4.9)). 

So, starting from (|3. 14[) . then using identity (|3. 15|) along with Lemma 13.11 
and estimate ([3.17P and also estimate (|3.16p . we obtain 

\P[W -s e A] — Bi(n,p){A}\ 

" 2cr»(lV(t,/2-l)) LftPi«<&(ft " Pi) 

+ 2if(Ai({n*} + {s*})/n + {s*}) + P[W < s] + P[W > n + a]. 
Note now that 

2 ^PiPjQiQjiPi ~ Pj) 2 = (Ai - A 2 )(A 3 - A 4 ) - (A 2 - A 3 ) 2 . 

Consequently, to complete the proof for the total variation distance one 
needs to estimate tails of W. Note that Xi — pi satisfies Bernstein's inequality 
with parameter r = 1. Therefore, 

4 

P(W <s) = P(W - Ai < 8 - Ai) < exp{- * } < exp{-<7 2 /4}. 

Similarly, by applying estimate 

2 

F(VT-Ai>x)<exp{^-|}, 

see equation (4.3) from Arak and Zaitsev ( 19881 ). we get 

P(W > n + s) < exp{-cr 2 /4 + 1}. (3.18) 

Estimate F(W < s) < smaxi < spt is straightforward. 

To obtain result for the <ii oc metric, the proof is similar, except that we 
now have A = k for some k E % and bound (|3.9p . We need some refinements 
of the estimates of E[12i(W) + i2 2 (W)] and EB*(W). Similar to (IBTTTD . 

|EA^)| < < {ly{ l K l))1/2 

and, choosing #ij,fc, k = 1,2,3, so that the corresponding {pi A qi) sum up to 
at least (i>/3 — 2u*), 

|EA 3 9 (^-)| < Ibll^ 3 ^) < Ibll n^(^) ^ (1 v(X-2)) 3 / 2 ' 

(3.19) 

Plugging these estimates into the corresponding inequalities, the final estimate 
is easily obtained. □ 
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4 NUMERICAL RESULTS 



In this section we study the sum of Bernoulli random variables X\ , . . . , X\qq 
with uniformly spread probabilities from to some parameter M, so that 
Pi = zM/(101), i = 1,2,... ,100. We analytically compute the exact distri- 
bution of W = X^=i Xi and then the exact total variation distance between 
W and several different approximations for different values of M . Figure [1] 
shows a graph of the exact total variation approximation error for several dif- 
ferent approximations versus M, referred to in the graph on the X-axis as 
the "maximum probability." In the graph "Poisson" is the standard Pois- 
son approximation where the parameter is chosen to match the first moment. 
"Binomial" refers to a binomial approximation where a number of trials n is 
fixed to equal 100 but the probability of succ ess p is chos en to match the first 
moment (this is the approximation studied in lEhml (|199lh ). "Shifted Poisson" 
refers to the approximation where a constant is added to a Poisson random 
variable and the two parameters - the constant and the Poisson rate - are 
chosen to match the first t wo m oments (this is the approximation studied in 
Cekanavicius and Vaitkusl ( 2001 )). The "Normal" approximation is the stan- 



dard normal approximation to the binomial distribution using the continuity 
correction. "2 parameter binomial" refers to the approximation where the 
two binomial parameters n and p ar e chosen to match the first two moments 
(this is the approximation studied in lSoon (|l996h). Finally, "shifted binomial" 
refers to the approximation we propose in this paper - where the shift, the 
number of trials, and the probability of success are chosen to match the first 
three moments. 

We see in Figure Q] that the normal approximation performs well when 
probabilities are widely spread out but performs very poorly when probabil- 
ities are very small. We see that the Poisson, shifted Poisson, and binomial 
approximations are best for small probabilities but not otherwise. The two 
parameter binomial approximation is quite good, but the shifted binomial ap- 
proximation performs the best over the widest range of values of M. Since 
the value of M can be viewed as varying widely in our statistical application, 
this would be the preferred approximation. 

In summary, we see that over a range of different Poisson-Binomial ran- 
dom variables that the shifted binomial approximation performs very well - 
usually better than the other two standard binomial approximations studied 
previously in the literature. The advantage of the shifted binomial approx- 
imation seems to increase as the spread among the Bernoulli probabilities 
increases. 
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Figure 1: Exact total variation distance error between W using pi = iM/ 101, 
i = 1, . . . , 100, and six approximations as a function of the maximum proba- 
bility M. 



14 



5 APPLICATION TO BAYESIAN HIERARCHICAL MODELING 



The study o f shifted binomial approximations is motivated by a statistical 
problem (see Pekoz. Shwartz. Christiansen, and Berlowitz (2009)) of ranking 



a large number of hospitals with respect to quality as measured by the risk of 
adverse events at the hospitals. Let X^ be a binary data variable that equals 
1 if adverse event of a particular type happens to patient i in hospital facility 
j, and equals zero otherwise. We are interested in the following model where 
are the data values, pij are known constants, and 9j, and a 2 are unknown 
parameters that we would like to estimate: 

X^ \ pij,0j ~ Be(logit _1 (logit(py) + %)) 

where 



'3 



\a 2 ~ N(0,cj 2 ) 



In this model is a risk-adjusted probability that has been previously calcu- 
lated by taking into account various patient specific indicators and it represents 
the chance patient i would have an adverse event at a typical hospital. The 
parameter 9j is a hospital specific factor that increases or decreases the prob- 
ability of an adverse event for its patients. Hospitals with a high value of 6j 
are poorly performing hospitals. Our goal is to rank hospitals by the values 
of 0j. The standard Bayesian hierarchical modeling approach is to put prior 
distributions on the unspecified parameters and estimate the posterior means 
of all the parameters conditional on the data. 

The difficulty in this situation is that the values of Xij and are both 
confidential and are too numerous to conveniently transmit from each of the 
hospitals to the main research facility that would be performing the analysis. 
We need a method for summarizing each of these so that each facility only 
needs to report a few summary statistics. In our application we have thousands 
of hospitals, thousands of people in each hospital and a number of different 
types of adverse events. A rough approximation of 5, 000 hospitals with 1, 000 
people each yields a total of 5, 000 x 1, 000 = 5, 000, 000 random variables — too 
many to be conveniently computable by standard software. 

To circumvent this difficulty we propose that each hospital aggregate its 
patients and compute Yj = Yli-^-iji the number of people in hospital j who 
have an adverse event. We then use the shifted binomial approximation above 
for Yj. This will then yield a total of 5,000 random variables — much more 
easily manageable computationally. 

To implement the approximation, in the preparation stage, hospital j also 
stores and submits the values of Xj m = YliPij f° r m = 1> 2, 3 and all j. Then 
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we can easily compute the shifted binomial approximation to Yj from these as 
a function of 9j. This results in the following model: 




r 2 ~ N(0,CJ 2 ), 
Bi(n J -,logit- 1 (logit(p j ) + %)) 



Yj Sj | 8j , n j , pj , 



with 



Pj = 



Aj2 ~ Aj 3 



Aji ~ Aj2 

Pj{ i -Pjy 



Sj = Xji- rijpj 



being the parameters for the shifted binomial approximation designed to match 
up three moments. 

Remark 5.1. Though the binomial distribution is not defined for fractional 
values of the parameter n, we can use a fractional parameter in the likelihood 
function for the data to obtain in some sense an interpolation of the likelihood 
functions under the two closest binomial models having integer parameters. 
For many statistical parameter estimation software packages using likelihood- 
based approaches, such as maximum likelihood or the Metropolis algorithm, 
such fractional values of the binomial parameter n can be used this way to 
yield better approximations. 

For example in the simple model for the data X\n,p ~ Bi(n,p), the like- 
lihood function for the data as a function of the unknown parameter p is 
L(p) oc p x (l — p) n ~ x . Under likelihood-based approaches this function is all 
that is used from the model to estimate the parameters, and so the use of 
non-integer n the function L(p) can be viewed as yielding an interpolation of 
the likelihood functions L\(p) oc p x (l —p)\ n ~\~ x and ^(p) oc p x (1 — p)L n J _x . 
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